# NCTU-EE IC LAB - Spring 2025

# Midterm Project

# Design: Maze Router Accelerator (MRA)

## **Data Preparation**

1. Extract test data from TA's directory:

% tar xvf ~iclabTA01/Midterm ProjectS.tar

## **Design Description**

Wire routing, often referred to simply as "routing", is a pivotal stage in IC (Integrated Circuit) design. Following the placement phase, the routing process introduces the necessary wires to connect the components in accordance with the IC's design rules. All routing tools share a fundamental objective: they begin with predefined polygons representing terminals on cells. Each polygon is linked to a specific net, either by name or number. The router's main responsibility is to generate geometries that ensure terminals linked to the same net are connected. At the same time, terminals from different nets should remain unconnected, while adhering to all design guidelines.

In this project, you'll be developing a robust router called the "Maze Router". This router will handle multiple un-routed targets stored in DRAM. Your primary objective is to ensure that all terminals linked to a specific net are interconnected. Let's delve deeper into how the Maze Router operates.

Here is step by step illustration:

Step 0: Get Input & Fetch DRAM



Step 1: Filling Path Map



**Location Map** 

| frame_id | 0 |  |  |  |  |  |  |  |  |  |
|----------|---|--|--|--|--|--|--|--|--|--|
| net_id   |   |  |  |  |  |  |  |  |  |  |
| loc_x    |   |  |  |  |  |  |  |  |  |  |
| loc_y    |   |  |  |  |  |  |  |  |  |  |

source sink source sink

Path Map

"Wave Propagation" from S (source) With sequence "1,2,3,1,2,3,..." Until reached T (sink)

Step 2: Retrace Path Map

Since Down priority is higher than Left.



**Location Map** 

## **Retrace Priority**

- 1) Down (y+1) 2) UP (y-1)
- 3) Right (x+1) 4) Left (x-1)

Path Map

"Retrace" from T (sink)
Based on retrace priority to discover right back-trace sequence, i.e.
"3,2,1,3,2,1..." or "2,1,3,2,1,3..." or
"1,3,2,1,3,2..." Until reached S (source)

Step 3: Update Location Map



**Loop:** Until all targets routed



| frame_id |    | ( | ) |    |
|----------|----|---|---|----|
|          | 9  |   |   | 4  |
|          | 5  |   | 4 | 13 |
|          | 10 |   | 4 | 13 |

source sink

Do it again for another target until all targets find the path.

## Step 4: Calculate Path Cost



9 9 6 6 5 8 5 8 8 8 5 8 

8 8 8

8 9

9 9 5 6

**Location Map** 

Weight Map (store in DRAM)

5 5

Accumulate weight on the path from Source to Sink.

Ex: 8+7+6+5+7+7+6+5+5+6+7+8 = 77 Record it as target #1 path cost. (Total cost += target #1 cost )

## Loop2 >> Step 1: Filling Path Map

|  | 4 | 4 | 4 | 4 |   |   |   |   | 9 | 9 | 9 | 9 |  |
|--|---|---|---|---|---|---|---|---|---|---|---|---|--|
|  | 4 | 4 | 4 | 4 |   |   |   |   | 9 | 9 | 9 | 9 |  |
|  | 4 | 4 | 4 | 4 |   |   |   |   |   | 9 |   |   |  |
|  |   |   |   |   |   |   |   |   |   | 9 |   |   |  |
|  |   |   |   |   |   |   |   |   |   | 9 |   |   |  |
|  |   |   |   |   |   |   |   |   |   | 9 |   |   |  |
|  |   |   |   |   |   |   |   |   |   | 9 |   |   |  |
|  |   |   |   |   |   |   |   |   |   | 9 |   |   |  |
|  |   |   | 9 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |   |   |  |
|  |   |   | 9 | 9 |   |   |   |   |   |   |   |   |  |
|  |   |   |   |   |   |   |   |   |   | 4 | 4 | 4 |  |
|  |   |   |   |   |   |   |   |   |   | 4 | 4 | 4 |  |
|  |   |   |   |   |   |   |   |   |   |   |   |   |  |
|  |   |   |   |   |   |   |   |   |   |   |   |   |  |

| 1 | 3 | 1 | 2 | 1 | 3 | 2        | 3        | 1 | 2 | 3 | 1 | 2 | 3        | 1 | 2              |
|---|---|---|---|---|---|----------|----------|---|---|---|---|---|----------|---|----------------|
|   |   |   | - | - | - | $\vdash$ | $\vdash$ |   | - |   |   |   | $\vdash$ | _ | $\vdash\vdash$ |
| 3 | 2 | 3 | 1 | 3 | 2 | 1        | 2        | 3 | 1 | 2 | 3 | 1 | 2        | 3 | 1              |
| 2 | 1 | 4 | 4 | 4 | 4 | 3        | 1        | 2 | 3 | 9 | 9 | 9 | 9        | 1 | 2              |
| 1 | 3 | 4 | 4 | 4 | 4 | 2        | 3        | 1 | 2 | 9 | 9 | 9 | 9        | 2 | 3              |
| 3 | 2 | 4 | 4 | s | 4 | 1        | 2        | 3 | 1 | 2 | 9 | 2 | 1        | 3 | 1              |
| 2 | 1 | 3 | 2 | 1 | 2 | 3        | 1        | 2 | 3 | 1 | 9 |   | 2        | 1 | 2              |
| 3 | 2 | 1 | 3 | 2 | 3 | 1        | 2        | 3 | 1 | 2 | 9 |   |          | 2 |                |
| 1 | 3 | 2 | 1 | 3 | 1 | 2        | 3        | 1 | 2 | 3 | 9 |   |          |   |                |
| 2 | 1 | 3 | 2 | 1 | 2 | 3        | 1        | 2 | 3 | 1 | 9 |   |          |   |                |
| 3 | 2 | 1 | 3 | 2 | 3 | 1        | 2        | 3 | 1 | 2 | 9 |   |          |   |                |
| 1 | 3 | 2 | 1 | 9 | 9 | 9        | 9        | 9 | 9 | 9 | 9 | 2 |          |   |                |
| 2 | 1 | 3 | 2 | 9 | 9 | 1        | 2        | 3 | 1 | 2 | 3 | 1 | 2        |   |                |
| 3 | 2 | 1 | 3 | 1 | 2 | 3        | 1        | 2 | 3 | 1 | 4 | 4 | 4        |   |                |
| 1 | 3 | 2 | 1 | 2 | 3 | 1        | 2        | 3 | 1 | 2 | 4 | 4 | Т        |   |                |
| 2 | 1 | 3 | 2 | 3 | 1 | 2        | 3        | 1 | 2 | 3 | 1 | 2 | 3        |   |                |
| 3 | 2 | 1 | 3 | 1 | 2 | 3        | 1        | 2 | 3 | 1 | 2 | 3 |          |   |                |

**Location Map** 

# Path Map

"Wave Propagation" from S (source)
With sequence "1,2,3,1,2,3,..."
Until reached T (sink)

# Loop2 >> Step 2: Retrace Path Map



|   | 1 | 3 | 1 | 2 | 1 | 3 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|   | 3 | 2 | 3 | 1 | 3 | 2 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 |
|   | 2 | 1 | 4 | 4 | 4 | 4 | 3 | 1 | 2 | 3 | 9 | 9 | 9 | 9 | 1 | 2 |
| V | 1 | 3 | 4 | 4 | 4 | 4 | 2 | 3 | 1 | 2 | 9 | 9 | 9 | 9 | 2 | 3 |
| A | 3 | 2 | 4 | 4 | S | 4 | 1 | 2 | 3 | 1 | 2 | 9 | 2 | 1 | 3 | 1 |
|   | 2 | 1 | 3 | 2 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 9 |   | 2 | 1 | 2 |
|   | 3 | 2 | 1 | 3 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 9 |   |   | 2 |   |
|   | 1 | 3 | 2 | 1 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 9 |   |   |   |   |
|   | 2 | 1 | 3 | 2 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 9 |   |   |   |   |
|   | 3 | 2 | 1 | 3 | 2 | 9 | 9 | 9 | 9 | 9 | 9 | 9 |   |   |   |   |
|   | 1 | 3 | 2 | 1 | 9 | 9 | 2 | 3 | 1 | 2 | 3 | 1 | 2 |   |   |   |
|   | 2 | 1 | 3 | 2 | 9 | 9 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 |   |   |
|   | 3 | 2 | 1 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 4 | 4 | 4 |   |   |
|   | 1 | 3 | 2 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 4 | 4 | Т |   |   |
|   | 2 | 1 | 3 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |   |   |
|   | 3 | 2 | 1 | 3 | 1 | 2 | 3 | 1 | 2 | 3 | 1 | 2 | 3 |   |   |   |

**Location Map** 

#### Path Map

## **Retrace Priority**

- 1) Down (y+1)
- 2) UP (y-1)
- 3) Right (x+1)
- 4) Left (x-1)

"Retrace" from T (sink)
Based on **retrace priority** to discover right back-trace sequence, i.e.
"3,2,1,3,2,1..." or "2,1,3,2,1,3..." or
"1,3,2,1,3,2..." Until reached S (source)

Loop2 >>Step 3: Update Location Map



All targets find path



# Loop2 >>Step 4: Calculate Path Cost

Source Sink Source Sink



**Location Map** 



Weight Map (store in DRAM)

Accumulate weight on the path from Source to Sink. Ex:8+8+5+6+7+8+8+8+8+8+8+7+6+5+8+8+8+8+8+8=156Record it as target #2 path cost. (Total cost += target #2 cost )

**Step 5:** Write Back & Output



After processing one location map and all data is written back to DRAM, **busy** would be pull low so that pattern could check answer in DRAM. Note that routing result could be different with TA's method, only you reach the routing goal, you can pass.

Here is some check point in routed result:

#### (1) Consistent Check:

Check original target is consistent with original location map store in DRAM.



All Macros should be place at original location

## (2) Connectivity Check

Check routed location map store in DRAM to see if source can reach sink or not.



## (3) Cost Check

Check weight output is correct with your routed location map in DRAM



Calculating golden total weight in pattern, compare to MRA.v output

Note that only if (1) pass, then will check (2), then (3). Any violation of rules will fail this Midterm.

Now, we need to formulate this problem more specifically with well-define SPEC. Hence, let me introduce definition/SPEC precisely for in this midterm project.

#### **SPEC for Test Case Generation**

- 1. One Location map called a Frame, with 64 by 64 4-bit array
- 2. Each Frame has its Frame\_id, identify the address store in DRAM
- 3. Location map include several routing Target, # of targets would range in 1~15
- 4. Each Target owns its NET\_ID, NET\_ID would range from 1~15
- 5. One Target consisted of 2 Macro, one is Source, the other is Sink
- 6. Macro height and width would be ranged in 2~6 if # of targets is less than 11, otherwise would be 2~4
- 7. Macro has one Terminal, which must be located at the outermost region of the Macro
- 8. Location of Terminal would be sent by input loc\_x and loc\_y, first is Source, followed by Sink
- Location map records all Macro locations, identified by a 4-bit value NET\_ID, while 0 represents empty region
- 10. Margin of Location map would be the outermost 2 rows and 2 columns
- 11. Macro would never be placed at the Margin area
- 12. Target is routed means only one path is highlighted with NET\_ID from Source Terminal to Sink Terminal in Location map
- 13. Length means path grid number from Source Terminal to Sink Terminal exclusive itself when Target is routed
- 14. Length of each Target is limited within 1000 units, i.e. no case over 1000 units
- 15. Weight means path weighted sum from Source Terminal to Sink Terminal exclusive itself when Target is routed
- 16. Location map routing success means all Target is routed in Location map (not unique solution)
- 17. Location map must be routing success with given approach
- 18. Cost of routing result means the accumulation Weight when Location map routing success



#### **System Architecture**



First, pattern will give the **frame\_id** of Location Map and Weight in DRAM, also send the corresponding **net\_id**, source location and sink location with the **loc\_x**, **loc\_y** in a row when **in\_valid** is high. Note that the number of targets will not be provided. It depends on how

many **net\_id** you get when in\_valid is high. Then, your chip should fetch frame data stored in DRAM via AXI-4. While your design is dealing with current task, **busy** should be high. After all processing is done, and all data is written back to DRAM, **busy** should be pull low so that pattern could check the answer in DRAM and output cost is consistent with golden one. On the other hand, pattern can directly access DRAM for data checking and overwriting without any constrain. In this project, all frames are 64 x 64, 4-bit net id location map, and all Weight Maps are 64 x 64, 4-bit integer. The DRAM behavior model (pseudo\_DRAM.v) is provided by TA.

# **Memory Mapping**

Example of data in memory:



Since all 16 frames are 64 x 64, 4-bit location map, we need 2048 bytes to store single frame. Each address in DRAM can store 1 byte (8 bits). Thus, the address of frame No.0 is range from 0x0001\_0000 to 0x0001\_07FF, No.1 is ranged from 0x0001\_0800 to 0x0001\_0FFF and so on. Every frame is **in raster scan order** in DRAM, each byte in DRAM represent 2 position value, where [3:0] means 1<sup>st</sup> element, and [7:4] means 2<sup>nd</sup> element.

#### **Mapping Example:**

For example,  $0x0001\_0000$  stores the (0, 0) [3:0] and (1,0) [7:4] value of Frame NO.0,  $0x0001\_0001$  stores the (2, 0) [3:0] and (3, 0) [7:4] of Frame NO.0.

#### Variable in pseudo\_DRAM.v

reg [7:0] DRAM\_r [0:196607]; (Address from 000000000 to 0002FFFF)



As for Weight Map, all of them are 64 x 64, 4-bit integer, and also in raster scan order in DRAM. The address of Weight No.0 is range from 0x0002\_0000 to 0x0002\_07FF, No.1 is ranged from 0x0002\_0800 to 0x0002\_0FFF and so on.

#### **Fetch DRAM:**

Given **frame\_id** = 5, that means your need to fetch DRAM  $0x0001_2800 \sim 0x0001_2FFF$  for Location map and  $0x0002_2800 \sim 0x0002_2FFF$  for Weight map with AXI4 protocol.

Note that Memory Address is Byte Offset as shown in above example. Each address is mapped to single byte.

You should **generate the SRAM** to store the required data. TA will check if you truly use memory, and if **NOT**, you will **Fail** the demo.

Please generate the SRAM in the lab5 folder, then copy them into the 04\_MEM folder in the Midterm\_Project folder.

#### **Control Interface**

#### **Inputs**

- 1. You will receive 1 number **frame\_id** [4:0] which is the index of both Location Map and Weight Map.
- 2. You will receive several numbers of **net\_id** [3:0] which identify (1) NET\_ID of target that stores in Location Map, (2) Routing Sequence. One **net\_id** will show up 2 cycles.
- 3. Numbers targets would not be given. It is based on how many **net\_id** you received when in\_valid is high.
- 4. For each target, you will receive 2 numbers of **loc\_x**, **loc\_y** [5:0] which identify (1) Source location of target at 1<sup>st</sup> cycle, (2) Sink location of target at 2<sup>nd</sup> cycle.
- 5. Input signal **frame\_id**, **net\_id**, **loc\_x**, **loc\_y** are valid when **in\_valid** is high, otherwise would be unknown value.
- 6. There is **only 1 reset** before the first pattern, thus, your design must be able to reset automatically.
- 7. All inputs will be changed at clock *negative* edge.
- 8. The next input pattern will come in 3 cycles after **busy** falls.

| Name     | Width | Functional Description                                                                                      |
|----------|-------|-------------------------------------------------------------------------------------------------------------|
| rst_n    | 1     | Asynchronous reset and active-low                                                                           |
| clk      | 1     | Clock for MRA chip curity                                                                                   |
| in_valid | 1     | When in_valid is high, all the other three inputs are valid.                                                |
| frame_id | 5     | Index of Location/Weight map, range in No. 0 ~ No. 31                                                       |
| net_id   | 4     | NET ID that stores in Location map, one location map may contain 1~15 targets, so net_id will range in 1~15 |
| loc_x    | 6     | Indicate location (x coordinate) of either source or sink terminal for routing target, range in 0~63        |
| loc_y    | 6     | Indicate location (y coordinate) of either source or sink terminal for routing target, range in 0~63        |



Fig 1. Input waveform example

#### **Outputs**

- 1. Your routed result should be written back to DRAM. After busy is pulled low, pattern will check the correctness of the value inside DRAM.
- 2. All outputs are synchronized at clock **positive** edge.
- 3. **busy** should be low after initial reset.
- 4. **busy** should not be raised when **in\_valid** is high.

The test pattern will check whether your data in DRAM and **cost** is correct or not at the first clock *negative* edge after busy pulled low.



Fig 2. Output waveform example

#### **AXI4 IO**

Please refer to AXI4\_Introduction

#### **Specifications**

- 1. Top module name: MRA(File name: MRA.v)
- 2. It is an asynchronous reset and active-low architecture.
- 3. All the data are in **unsigned** format.
- 4. The total cell area should not larger than  $2,500,000 \mu m^2$ . The latency of your design in each pattern should not be larger than 1,000,000 cycles.
- 5. The maximum clock period is set to 15 ns, and you can determine the clock period by yourself.
- 6. The input delay and the output delay should be **half** of the clock period. For example, if clock period is 10ns, the input delay and the output delay will be 5ns.
- 7. The output loading is set to 0.05.
- 8. The synthesis result of data type cannot include any **LATCH** (check in syn.log).
- 9. The slack in the end of MRA.timing should be **non-negative** and the result should be **MET**.
- 10. Since memories are used in this project, the syn.tcl and .synopsys\_dc.setup in Lab05 may be a good reference one.

- 11. Note that maximum synthesis time could not be over 2 hours (Normally, this design would not over 2 hours)
- 12. Memory name should follow the rule in Lab05

13. No **ERROR** is allowed in every simulation/synthesis.

#### **Grading Policy**

1. The performance is determined by latency and cycle time of your design. The smaller the performance index is, the higher score you can get.

Score = SampleCase(20%) + Functionality(50%)+ Performance(30%)

SampleCase (20%): 2 simple case, 2 medium case and 2 hard case

Functionality(50%): Random test

**Performance**(30%): Total Latency<sup>2</sup> x Cycle Time

- 2. Note that cell area must within 2,500,000 um^2, or you will fail
- 3. Performance points can be obtained when both Sample Case and Functionality pass
- 4. 2<sup>nd</sup> demo will get 30% off of the total score.

TA will demo your design with the DRAM latency ranging from 300 to 500, not fixed.

#### Note

1. Please submit your files under 09\_SUBMIT before 12:00 at noon:

Due Day: 1st Demo: 04/14 2nd Demo: 04/16

If uploaded files violate the naming rule, you will get 5 deduct points

 In this lab, you can adjust your clock cycle time. Consequently, make sure to key in your clock cycle time after the command like the figure below. It means that the TA will demo your design under this clock cycle time:

```
Exercise/09_SUBMIT]$ ./00_tar 15.0
```

• After that, you should check the following files under

09\_SUBMIT/Midterm\_Project\_iclabXXX/

**RTL design:** MRA\_iclab???.v, (??? is your account number)

**Memory File: MEMORY\_NAME\_iclab???.v** 

MEMORY\_NAME\_iclab???.db

File List for Customized SRAM: filelist\_iclab???.f

Clock period: clock\_period\_iclab???.txt (ex: 20.0\_iclab099.txt)

If you miss any files on the list, you will fail this lab.

Then use the command like the figure below to check if the files are uploaded or not.

# [Exercise/09\_SUBMIT]% ./02\_check 1st\_demo

## **Example:**

You will submit MRA\_iclab099.v, 20.0\_iclab099.txt

Given your memory name is RA1SH512, and your submitted memory file

RA1SH512 iclab099.v, RA1SH512 iclab099.db, filelist iclab099.f

And you modified in filelist\_iclab099.f

../04 MEM/RA1SH512.v

(Note that in filelist\_iclab099.f, no \_iclab099 after your module name)

Besides, you can provide **multiple** memory specs in this lab.

If the uploaded files violating the naming rule, you will get **5 deduct points**.

If you omit any file, you will fail demo.

2. Template folders and reference commands:

01\_RTL/ (for RTL simulation)

./01\_run\_vcs\_rtl

./01 run vcs rtl FUNC

./01 run vcs rtl SAMPLE

02\_SYN/ (for Synthesis)

./01\_run\_dc\_shell

Remember modify .tcl to fit your memory db name (refer to Lab05)

(Check the design which contains **Latch** and **Error** or not in **syn.log**)

(Check the design's timing in /Report/MRA.timing to see if the slack is **MET**)

03\_GATE\_SIM/ (Gate Level simulation) ./01\_run\_vcs\_gate

You can key in ./09 clean up to clear all log files and dump files in each folder

04\_MEM/ (Memory location)

You should generate your memories and put the required files (.v and .db) here.

09 SUBMIT/ (submit your files) ./00 tar ./01 submit ./02 check

You should make sure the clock period values are identical in

00\_TESTBED/PATTERN.v and 02\_SYN/syn.tcl

#### **Sample Case**

#### 3 Cases: Simple, Medium, Hard

- \_0: Simple
- \_1: Medium
- 2: Hard
- 00 TESTBED/DRAM/

#### dram.dat

You can write a pattern to read this file and send to chip

00\_TESTBED/TEST\_CASE/

#### Input.txt

You can write a pattern to read this file and send to chip

## Map.txt, Output.txt, Weight.txt

Provide for you to debug

#### 00 TESTBED/Picture/

#### \*.png

Provide for you to debug



#### Hint

## 1. Recommend development flow:

- I. Establish a PATTERN for testing your circuit
- II. Ensure your design can fetch data from DRAM with AXI4
- III. Ensure your design can write back to DRAM with AXI4
- IV. Design the Memory usage of whole algorithm
- V. Design the Architecture of circuit and FSM for each part
- VI. Implementation of algorithm  $(01 \rightarrow 02 \rightarrow 03)$
- VII. Verify circuit functionality with more test case
- VIII. Find out bottleneck and do some optimization
- IX. Optimize latency Algorithm and Architecture
- X. Optimize area Memory and Circuit Design
- XI. Optimize routed cost Algorithm (optional)

#### 2. TA provide a pseudo code for those not familiar with this algorithm

#### Algorithm 1 Filling Path Map

```
1: Traverse Sequence = 1,2,3,1,2,3,...
2: CLEAN node list
3: PUSH source into node list
4: while node list is not empty do
      POP node list as current node
      for neighbor is Down, Up, Right, Left Node do
6:
7:
          if neighbor is sink then
             current \ node = neighbor
             break while loop
9:
          else if neighbor is inside region, not block, not traversed then
10:
11:
             TRAVERSED neighbor (Update Path Map)
             PUSH neighbor into node list
12:
13:
          end if
      end for
14:
15: end while
16: if current node is not sink then
17:
      ERROR
18: end if
```

#### Algorithm 2 Retrace Path Map

```
1: Traverse Sequence = 3,2,1,3,2,1,...
2: current \ node = sink
3: while current node is not source do
       for neighbor is Down, Up, Right, Left Node do
          {f if} neighbor is right Traverse Sequence and inside region {f then}
 5:
              Update Location Map
 6:
 7:
              current \ node = neighbor
              break For loop
 8:
          end if
 9:
       end for
10:
11: end while
```